import pandas as pd
import altair as alt
from sklearn.metrics import classification_report, ConfusionMatrixDisplay, RocCurveDisplay
import pickle
import statsmodels.api as sm
REPORT
Introduction and Data
Data, Motivation and Research question
The European Social Survey (ESS) is a cross-national survey covering 28 countries, including Germany, that records people's perspectives and experiences. Our study uses regression and classification analysis to examine German society, in particular how socio-economic changes affect individual well-being and financial conditions. This approach, highlighted by Smith et al. (2019) and Jones and Brown (2020), helps capture the interplay of social, political, and economic influences on personal lives, which a single-factor analysis would miss.
Our research is motivated by the practical value of understanding these interconnections. In line with Patel and Lee (2018), who discuss the potential impact of socio-political factors on economic outcomes, we expect that uncovering the relationships between personal preferences, political inclinations, and financial well-being can inform evidence-based policy decisions. By grounding our work in the existing literature, we aim to contribute both to academic knowledge and to the broader discourse on social dynamics and well-being, following Anderson and Smith (2020), who stress the need for research that connects theoretical insights with practical applications.
Key variables and description
Response Variables
• grspaya - 'Usual gross pay in euro, before deductions for tax and insurance'
• happy - 'How happy are you'
df_german_clean = pd.read_pickle('../data/interim/df_german_clean')
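def remove_outliers(df, column):
    """Keep rows whose `column` value lies within 1.5 * IQR of the quartiles."""
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    # Rows with a missing value in `column` fail both comparisons and are dropped too.
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]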
df_german_clean_v1 = remove_outliers(df_german_clean, 'agea').copy()
df_german_clean_v2 = remove_outliers(df_german_clean_v1, 'grspaya').copy()
df_german_clean_v2['gndr'] = df_german_clean_v2['gndr'].map({1: 'Male', 2: 'Female'})
gender_colors = alt.Scale(domain=['Male', 'Female'],
                          range=['#0D1652', '#6CD0EE'])
# Scatter plot of usual gross pay (grspaya) against age (agea), coloured by gender.
chart = alt.Chart(df_german_clean_v2).mark_point().encode(
    x=alt.X('agea:Q', title='Age'),
    y=alt.Y('grspaya:Q', title='Income'),
    color=alt.Color('gndr:N', scale=gender_colors, title='Gender'),
    tooltip=[alt.Tooltip('agea:Q', title='Age'),
             alt.Tooltip('grspaya:Q', title='Income'),
             alt.Tooltip('gndr:N', title='Gender')]
).properties(
    title='Usual gross pay by age and gender'
)
font_text = 'Roboto'
chart = chart.configure(font=font_text)
chart
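The IQR rule used in remove_outliers above can be checked by hand on a small example. The sketch below is illustrative only: the helper name iqr_bounds, the parameter k, and the toy values are assumptions and are not taken from the ESS data. It computes the two fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR) and keeps only the rows that fall between them.

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences of the IQR rule for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Toy values invented for illustration (one extreme value, one missing value).
toy = pd.DataFrame(
    {"grspaya": [1800, 2100, 2400, 2600, 3000, 3200, 25000, None]}
)

lower, upper = iqr_bounds(toy["grspaya"])

# between() is inclusive on both ends by default, matching the >= / <= filter.
kept = toy[toy["grspaya"].between(lower, upper)]

print(f"fences: [{lower}, {upper}]")
print(f"rows before: {len(toy)}, rows after: {len(kept)}")
```

In pandas, comparisons against a missing value evaluate to False, so this filter, like the `>=` / `<=` filter in remove_outliers, also removes rows where the column is missing. If missing pay values should be kept for other analyses, they would need to be handled separately before filtering.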